FIGURE 6.17
(a) Effect of μ. (b) Effect of λ and γ. On VOC, we (a) select μ on the raw detector and different KD methods including Hint [33], FGFI [235], and IDa-Det; (b) select λ and γ on IDa-Det with μ set to 1e−4.
by 2.5%, 2.4%, and 1.8% compared with no distillation, Hint, and FGFI under the same
student-teacher framework. We then evaluate the proposed entropy distillation loss against
the conventional ℓ2 loss, the inner-product loss, and the cosine-similarity loss. As shown
in Table 6.5, our entropy distillation loss improves distillation performance by 0.4%, 0.3%,
and 0.4% over the ℓ2 loss when combined with Hint, FGFI, and IDa, respectively. It also
outperforms the inner-product and cosine-similarity losses by 2.1% and 0.5% mAP in our
framework, further demonstrating the effectiveness of our method.
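To make the compared objectives concrete, the following is a minimal PyTorch sketch of the four distillation losses in Table 6.5, applied to selected proposal features f_s (student) and f_t (teacher) flattened to shape (num_proposals, dim). The function names are ours, and entropy_loss is an illustrative Gaussian KL-style stand-in for the entropy distillation loss rather than the exact formulation used in IDa-Det:

import torch
import torch.nn.functional as F

def l2_loss(f_s, f_t):
    # Conventional feature-mimicking objective (mean-squared error).
    return F.mse_loss(f_s, f_t)

def inner_product_loss(f_s, f_t):
    # One plausible form: maximize the inner product between student and
    # teacher features, i.e., minimize its negation.
    return -(f_s * f_t).sum(dim=1).mean()

def cosine_similarity_loss(f_s, f_t):
    # Scale-invariant alignment: 1 - cos(f_s, f_t) per proposal.
    return (1.0 - F.cosine_similarity(f_s, f_t, dim=1)).mean()

def entropy_loss(f_s, f_t, eps=1e-6):
    # Illustrative stand-in: model each proposal's features as a Gaussian
    # and penalize the KL divergence between the student's and teacher's
    # per-proposal statistics (mean and variance).
    mu_s, var_s = f_s.mean(dim=1), f_s.var(dim=1) + eps
    mu_t, var_t = f_t.mean(dim=1), f_t.var(dim=1) + eps
    kl = 0.5 * (torch.log(var_t / var_s)
                + (var_s + (mu_s - mu_t) ** 2) / var_t - 1.0)
    return kl.mean()

During distillation, any of these terms would be weighted (e.g., by μ, as studied in Fig. 6.17) and added to the detection loss.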
TABLE 6.5
The effects of different components in IDa-Det with the Faster-RCNN
model on the PASCAL VOC dataset.

Model            Proposal selection   Distillation method   mAP
Res18            –                    –                     78.6
BiRes18          –                    –                     74.0
Res101-BiRes18   Hint                 ℓ2                    74.1
Res101-BiRes18   Hint                 Entropy loss          74.5
Res101-BiRes18   FGFI                 ℓ2                    74.7
Res101-BiRes18   FGFI                 Entropy loss          75.0
Res101-BiRes18   IDa                  Inner-product         74.8
Res101-BiRes18   IDa                  Cosine similarity     76.4
Res101-BiRes18   IDa                  ℓ2                    76.5
Res101-BiRes18   IDa                  Entropy loss          76.9
Note: Hint [33] and FGFI [235] are used for comparison with our information discrepancy-aware
proposal selection (IDa). IDa and the entropy loss are the main components of the proposed
IDa-Det.